Overview

Dataset statistics

Number of variables: 6
Number of observations: 843
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 39.6 KiB
Average record size in memory: 48.2 B

Variable types

Numeric: 6

Warnings

Houses - median sale price ($) is highly correlated with INCOME_36 and 3 other fields (High correlation)
INCOME_36 is highly correlated with Houses - median sale price ($) and 3 other fields (High correlation)
INCOME_21 is highly correlated with Houses - median sale price ($) and 1 other field (High correlation)
INCOME_27 is highly correlated with Houses - median sale price ($) and 2 other fields (High correlation)
INCOME_24 is highly correlated with Houses - median sale price ($) and 2 other fields (High correlation)
Houses - median sale price ($) is highly correlated with INCOME_36 and 2 other fields (High correlation)
INCOME_36 is highly correlated with Houses - median sale price ($) and 2 other fields (High correlation)
INCOME_21 is highly correlated with Houses - median sale price ($) and 1 other field (High correlation)
INCOME_24 is highly correlated with Houses - median sale price ($) and 1 other field (High correlation)
INCOME_36 is highly correlated with INCOME_21 (High correlation)
INCOME_21 is highly correlated with INCOME_36 (High correlation)
INCOME_36 is highly correlated with Houses - median sale price ($) and 4 other fields (High correlation)
Houses - median sale price ($) is highly correlated with INCOME_36 and 4 other fields (High correlation)
INCOME_21 is highly correlated with INCOME_36 and 4 other fields (High correlation)
INCOME_27 is highly correlated with INCOME_36 and 4 other fields (High correlation)
INCOME_24 is highly correlated with INCOME_36 and 4 other fields (High correlation)
INCOME_30 is highly correlated with INCOME_36 and 4 other fields (High correlation)

Reproduction

Analysis started: 2021-08-18 03:37:38.033403
Analysis finished: 2021-08-18 03:37:46.299675
Duration: 8.27 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

Houses - median sale price ($)
Real number (ℝ)

HIGH CORRELATION

Distinct: 496
Distinct (%): 58.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.865771414 × 10^-16
Minimum: -1.389170896
Maximum: 8.66419486
Zeros: 0
Zeros (%): 0.0%
Negative: 491
Negative (%): 58.2%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -1.389170896
5th percentile: -0.9824296019
Q1: -0.5965537842
Median: -0.1833089654
Q3: 0.1791270642
95th percentile: 1.737940717
Maximum: 8.66419486
Range: 10.05336576
Interquartile range (IQR): 0.7756808484
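
As a rough illustration of how the range and interquartile range relate to the other quantile statistics listed above, the following sketch recomputes them. The five-value list is hypothetical, and `statistics.quantiles` with `method="inclusive"` is only one common quantile convention (linear interpolation between order statistics); the report does not state which convention it uses.

```python
import statistics

# Hypothetical data, used only to show the definitions.
data = [1.0, 2.0, 3.0, 4.0, 5.0]

# Quartiles by linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                          # interquartile range
value_range = max(data) - min(data)    # range

# The same definitions applied to the figures reported above.
reported_iqr = 0.1791270642 - (-0.5965537842)    # Q3 - Q1
reported_range = 8.66419486 - (-1.389170896)     # Maximum - Minimum

print(q1, median, q3, iqr, value_range)
print(round(reported_iqr, 10), round(reported_range, 8))
```

Both recomputed figures agree with the reported IQR and range up to the rounding of the displayed digits.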

Descriptive statistics

Standard deviation: 1.000593648
Coefficient of variation (CV): 3.491533356 × 10^15
Kurtosis: 15.56378546
Mean: 2.865771414 × 10^-16
Median Absolute Deviation (MAD): 0.3996958083
Skewness: 3.088187381
Sum: 2.415845302 × 10^-13
Variance: 1.001187648
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
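
A mean of essentially zero together with a variance of 1.001187648 is what one would expect if the column had been z-score standardized with the population standard deviation and then summarized with the sample variance. The report does not say this, so treat it as an inference. The sketch below, using hypothetical raw prices and a hypothetical `standardize` helper, shows where the factor n / (n - 1) comes from and why the coefficient of variation becomes so large.

```python
import math
import statistics


def standardize(values):
    """Z-score a list using the population standard deviation (ddof = 0)."""
    mean = statistics.fmean(values)
    pop_sd = statistics.pstdev(values)
    return [(v - mean) / pop_sd for v in values]


# Hypothetical raw prices, for illustration only.
raw = [289_000.0, 310_000.0, 455_000.0, 520_000.0, 610_000.0, 1_250_000.0]
z = standardize(raw)
n = len(z)

# The sample variance (ddof = 1) of population-standardized data is n / (n - 1).
sample_variance = statistics.variance(z)

# With n = 843 observations this reproduces the reported variance and
# standard deviation to the displayed precision.
variance_843 = 843 / 842
sd_843 = math.sqrt(variance_843)

# CV = standard deviation / mean, so a mean near zero inflates it enormously.
cv = 1.000593648 / 2.865771414e-16

print(sample_variance, n / (n - 1))
print(variance_843, sd_843, cv)
```

Under this reading the CV carries no practical information for standardized columns, because it divides by a mean that is zero up to floating-point error.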
Most frequent values
Value                   Count  Frequency (%)
3.154624824 × 10^-16       78  9.3%
0.06057322275              10  1.2%
0.1147692646                8  0.9%
-0.5491322476               8  0.9%
-0.4949362058               7  0.8%
-0.6168772999               7  0.8%
-0.8878575089               6  0.7%
-0.4813871953               6  0.7%
-0.4136421431               6  0.7%
-0.2104069863               6  0.7%
Other values (486)        701  83.2%

Smallest values
Value                   Count  Frequency (%)
-1.389170896                1  0.1%
-1.316006239                1  0.1%
-1.240131781                1  0.1%
-1.199484749                1  0.1%
-1.192710244                1  0.1%
-1.185935739                1  0.1%
-1.172386728                2  0.2%
-1.169676926                1  0.1%
-1.164257322                1  0.1%
-1.158837718                1  0.1%

Largest values
Value                   Count  Frequency (%)
8.66419486                  1  0.1%
7.729313139                 1  0.1%
5.534373445                 1  0.1%
5.033060059                 1  0.1%
4.856922923                 1  0.1%
4.726852423                 1  0.1%
4.363738942                 1  0.1%
4.214699827                 1  0.1%
4.077854822                 1  0.1%
4.057531306                 1  0.1%

INCOME_36
Real number (ℝ)

HIGH CORRELATION

Distinct: 701
Distinct (%): 83.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.68574789 × 10^-17
Minimum: -1.99373369
Maximum: 7.378438997
Zeros: 0
Zeros (%): 0.0%
Negative: 558
Negative (%): 66.2%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -1.99373369
5th percentile: -1.212694667
Q1: -0.5910079861
Median: -0.0005273952406
Q3: 0.303651045
95th percentile: 1.928929517
Maximum: 7.378438997
Range: 9.372172688
Interquartile range (IQR): 0.8946590312

Descriptive statistics

Standard deviation: 1.000593648
Coefficient of variation (CV): 5.935606705 × 10^16
Kurtosis: 10.82464248
Mean: 1.68574789 × 10^-17
Median Absolute Deviation (MAD): 0.4660503543
Skewness: 2.377277901
Sum: 1.421085472 × 10^-14
Variance: 1.001187648
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value                   Count  Frequency (%)
-5.004564049 × 10^-18     136  16.1%
-0.4106967497               2  0.2%
-0.225670114                2  0.2%
-0.2505139338               2  0.2%
-1.365529905                2  0.2%
-0.4654516841               2  0.2%
-1.078383206                2  0.2%
0.8907534395                2  0.2%
-0.590163437                1  0.1%
-0.7401412844               1  0.1%
Other values (691)        691  82.0%

Smallest values
Value                   Count  Frequency (%)
-1.99373369                 1  0.1%
-1.615234928                1  0.1%
-1.505302784                1  0.1%
-1.489256351                1  0.1%
-1.482781474                1  0.1%
-1.480247827                1  0.1%
-1.465749734                1  0.1%
-1.448929131                1  0.1%
-1.420144082                1  0.1%
-1.417891951                1  0.1%

Largest values
Value                   Count  Frequency (%)
7.378438997                 1  0.1%
7.036466984                 1  0.1%
6.163414334                 1  0.1%
5.70982108                  1  0.1%
5.455963691                 1  0.1%
4.141352611                 1  0.1%
3.711477111                 1  0.1%
3.661156059                 1  0.1%
3.600700418                 1  0.1%
3.351417671                 1  0.1%

INCOME_30
Real number (ℝ)

HIGH CORRELATION

Distinct: 777
Distinct (%): 92.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -2.191472257 × 10^-16
Minimum: -2.573652203
Maximum: 5.857210978
Zeros: 0
Zeros (%): 0.0%
Negative: 479
Negative (%): 56.8%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -2.573652203
5th percentile: -1.49310825
Q1: -0.6217987022
Median: -0.005499171865
Q3: 0.4567254759
95th percentile: 1.756835074
Maximum: 5.857210978
Range: 8.430863181
Interquartile range (IQR): 1.078524178

Descriptive statistics

Standard deviation: 1.000593648
Coefficient of variation (CV): -4.565851311 × 10^15
Kurtosis: 2.720552367
Mean: -2.191472257 × 10^-16
Median Absolute Deviation (MAD): 0.543551877
Skewness: 0.9773246939
Sum: -1.847411113 × 10^-13
Variance: 1.001187648
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value                   Count  Frequency (%)
-2.396922782 × 10^-16      52  6.2%
-0.2505533065               2  0.2%
-0.04678838054              2  0.2%
-0.5495872724               2  0.2%
-0.6185813613               2  0.2%
-0.5778283761               2  0.2%
0.1355276058                2  0.2%
-0.5270658858               2  0.2%
-1.050598753                2  0.2%
-0.015508677                2  0.2%
Other values (767)        773  91.7%

Smallest values
Value                   Count  Frequency (%)
-2.573652203                1  0.1%
-2.361307702                1  0.1%
-2.204551702                1  0.1%
-2.194899679                1  0.1%
-2.159330187                1  0.1%
-2.093017216                1  0.1%
-2.092302251                1  0.1%
-2.080147852                1  0.1%
-1.938048627                1  0.1%
-1.921246958                1  0.1%

Largest values
Value                   Count  Frequency (%)
5.857210978                 1  0.1%
4.328259069                 1  0.1%
3.81366326                  1  0.1%
3.792929285                 1  0.1%
3.771480346                 1  0.1%
3.671385294                 1  0.1%
3.372887552                 1  0.1%
3.336603096                 1  0.1%
3.212377987                 1  0.1%
3.193788907                 1  0.1%

INCOME_21
Real number (ℝ)

HIGH CORRELATION

Distinct: 734
Distinct (%): 87.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -9.861625158 × 10^-16
Minimum: -2.09110046
Maximum: 4.667927675
Zeros: 0
Zeros (%): 0.0%
Negative: 525
Negative (%): 62.3%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -2.09110046
5th percentile: -1.370658788
Q1: -0.6500172727
Median: -0.02854987672
Q3: 0.4760964198
95th percentile: 1.752144281
Maximum: 4.667927675
Range: 6.759028135
Interquartile range (IQR): 1.126113692

Descriptive statistics

Standard deviation: 1.000593648
Coefficient of variation (CV): -1.014633625 × 10^15
Kurtosis: 2.777344673
Mean: -9.861625158 × 10^-16
Median Absolute Deviation (MAD): 0.5723321548
Skewness: 1.213443381
Sum: -8.313350008 × 10^-13
Variance: 1.001187648
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value                   Count  Frequency (%)
-1.264388796 × 10^-15      99  11.7%
-1.005520083                2  0.2%
-0.665526805                2  0.2%
-0.4940965118               2  0.2%
0.5540350782                2  0.2%
0.2327227496                2  0.2%
0.3161353606                2  0.2%
-0.3946095956               2  0.2%
0.6496989164                2  0.2%
-0.08902401969              2  0.2%
Other values (724)        726  86.1%

Smallest values
Value                   Count  Frequency (%)
-2.09110046                 1  0.1%
-1.987442913                1  0.1%
-1.738421514                1  0.1%
-1.684898422                1  0.1%
-1.661525513                1  0.1%
-1.661264849                1  0.1%
-1.647015194                1  0.1%
-1.623120957                1  0.1%
-1.608697526                1  0.1%
-1.59688074                 1  0.1%

Largest values
Value                   Count  Frequency (%)
4.667927675                 1  0.1%
4.369814479                 1  0.1%
4.314292959                 1  0.1%
4.213328945                 1  0.1%
4.170319317                 1  0.1%
4.12635392                  1  0.1%
3.985768915                 1  0.1%
3.756731788                 1  0.1%
3.412654767                 1  0.1%
3.341840936                 1  0.1%

INCOME_27
Real number (ℝ)

HIGH CORRELATION

Distinct: 764
Distinct (%): 90.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -1.264310918 × 10^-16
Minimum: -1.392692792
Maximum: 10.08686681
Zeros: 0
Zeros (%): 0.0%
Negative: 552
Negative (%): 65.5%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -1.392692792
5th percentile: -0.9595430277
Q1: -0.5853927909
Median: -0.1674990894
Q3: 0.1851011737
95th percentile: 1.7085101
Maximum: 10.08686681
Range: 11.4795596
Interquartile range (IQR): 0.7704939645

Descriptive statistics

Standard deviation: 1.000593648
Coefficient of variation (CV): -7.914142273 × 10^15
Kurtosis: 21.17458657
Mean: -1.264310918 × 10^-16
Median Absolute Deviation (MAD): 0.4071422579
Skewness: 3.481790033
Sum: -1.065814104 × 10^-13
Variance: 1.001187648
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value                   Count  Frequency (%)
-1.427500873 × 10^-16      44  5.2%
-0.0984387218               3  0.4%
-0.416273368                2  0.2%
-0.2348329477               2  0.2%
-0.3125258613               2  0.2%
-0.6937704813               2  0.2%
-0.3447017143               2  0.2%
-0.5746413473               2  0.2%
-0.6204723185               2  0.2%
0.1599098351                2  0.2%
Other values (754)        780  92.5%

Smallest values
Value                   Count  Frequency (%)
-1.392692792                1  0.1%
-1.277173632                1  0.1%
-1.232127438                1  0.1%
-1.200893317                1  0.1%
-1.185197779                1  0.1%
-1.171542661                1  0.1%
-1.153178881                1  0.1%
-1.146115889                1  0.1%
-1.115038724                1  0.1%
-1.112684393                1  0.1%

Largest values
Value                   Count  Frequency (%)
10.08686681                 1  0.1%
7.095768114                 1  0.1%
6.552231631                 1  0.1%
6.01842638                  1  0.1%
5.363608531                 1  0.1%
5.071200657                 1  0.1%
4.657623228                 1  0.1%
3.681203804                 1  0.1%
3.63207677                  1  0.1%
3.600057872                 1  0.1%

INCOME_24
Real number (ℝ)

HIGH CORRELATION

Distinct: 785
Distinct (%): 93.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -4.425088212 × 10^-16
Minimum: -2.921626255
Maximum: 12.29622818
Zeros: 0
Zeros (%): 0.0%
Negative: 574
Negative (%): 68.1%
Memory size: 6.7 KiB

Quantile statistics

Minimum: -2.921626255
5th percentile: -0.9774474735
Q1: -0.450091947
Median: -0.1761817045
Q3: 0.1372121406
95th percentile: 1.540036312
Maximum: 12.29622818
Range: 15.21785444
Interquartile range (IQR): 0.5873040875

Descriptive statistics

Standard deviation: 1.000593648
Coefficient of variation (CV): -2.261183507 × 10^15
Kurtosis: 35.24980894
Mean: -4.425088212 × 10^-16
Median Absolute Deviation (MAD): 0.2888761134
Skewness: 4.272803734
Sum: -3.730349363 × 10^-13
Variance: 1.001187648
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Most frequent values
Value                   Count  Frequency (%)
-5.627444061 × 10^-16      36  4.3%
0.0009337705959             2  0.2%
-0.2456357206               2  0.2%
-0.04114082263              2  0.2%
-1.147223098                2  0.2%
-0.1552217509               2  0.2%
-0.4476556424               2  0.2%
-0.3509768896               2  0.2%
0.03937324271               2  0.2%
-0.6799166782               2  0.2%
Other values (775)        789  93.6%

Smallest values
Value                   Count  Frequency (%)
-2.921626255                1  0.1%
-2.34031625                 1  0.1%
-1.838050794                1  0.1%
-1.800152723                1  0.1%
-1.721108174                1  0.1%
-1.720721459                1  0.1%
-1.682900731                1  0.1%
-1.60223198                 1  0.1%
-1.508646947                1  0.1%
-1.473301195                1  0.1%

Largest values
Value                   Count  Frequency (%)
12.29622818                 1  0.1%
6.937672956                 1  0.1%
6.022782582                 1  0.1%
5.619902884                 1  0.1%
5.177578254                 1  0.1%
5.012373601                 1  0.1%
4.486054471                 1  0.1%
4.431295625                 1  0.1%
4.288133728                 1  0.1%
3.954166644                 1  0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
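
The calculation described above can be sketched in plain Python as follows. The function name `pearson_r` and the sample data are illustrative assumptions; this is not the code the report itself runs.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors of the covariance and of the two
    # standard deviations cancel, so plain sums are sufficient.
    co_moment = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_moment / (spread_x * spread_y)


# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shifting and rescaling y by a positive factor leaves r unchanged.
r_rescaled = pearson_r(x, [3.0 * v + 7.0 for v in y])
print(r, r_rescaled)
```

A constant column has zero spread, so this sketch raises `ZeroDivisionError` in that case; a production implementation would need to decide how to report it.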

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
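
A minimal sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names and the choice of average ranks for ties are assumptions made for this illustration.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_moment = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_moment / (spread_x * spread_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this monotonic but nonlinear example the rank-based coefficient reaches its maximum while Pearson's r stays below it, which is the difference the paragraph above describes.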

Kendall's τ

Similarly to Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
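
The pair-counting definition can be sketched directly. This is the simple variant sometimes called tau-a, in which tied pairs count as neither concordant nor discordant; statistical libraries often apply a tie correction instead, so their results may differ when ties are present. The function name and data are illustrative assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 is a tie and contributes to neither count
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

The double loop over pairs is quadratic in the number of observations, which is fine for 843 rows but would be slow for very large samples.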

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

   Houses - median sale price ($)      INCOME_36  INCOME_30      INCOME_21      INCOME_27  INCOME_24
0                    3.154625e-16  -5.273952e-04   0.693379   1.689469e-01  -6.159206e-01  -0.346878
1                    2.638084e-01  -5.004564e-18   2.031971   1.201004e+00   2.441949e-01  -0.679917
2                   -4.813872e-01  -1.292054e+00   0.488184  -1.647015e+00  -9.843872e-02  -0.222974
3                   -5.288087e-01  -1.229206e+00   1.133797  -1.202495e+00  -1.427501e-16  -0.908079
4                   -4.136421e-01  -8.540850e-01   1.165970  -1.112306e+00  -1.987332e-01   0.590829
5                   -4.312559e-01  -1.094007e+00   0.937182  -1.264389e-15  -1.131925e-01  -0.209594
6                   -3.052501e-01  -1.031581e+00   0.696596  -1.099881e+00   9.555813e-02  -0.206113
7                    3.154625e-16  -4.973334e-01  -0.293809  -5.356290e-01  -4.181568e-01  -0.213229
8                    3.154625e-16  -2.086384e-01  -0.118642  -1.693955e-01  -2.396986e-01   0.337066
9                    3.154625e-16  -2.874630e-01  -0.719749  -2.254384e-01  -4.958497e-01  -0.098065

Last rows

     Houses - median sale price ($)      INCOME_36  INCOME_30      INCOME_21  INCOME_27      INCOME_24
833                        0.071412   1.022433e+00   3.092979  -1.264389e-15  -0.578408  -3.856266e-01
834                       -0.007172   4.073898e-01   1.712025   8.414610e-01  -0.568677  -5.627444e-16
835                       -0.312025  -6.830246e-02   0.531797   2.088947e-02  -1.030754   9.451880e-02
836                        0.602534   1.963894e+00   3.372888   2.229325e+00   0.725106   1.698303e+00
837                        0.040250   9.886508e-01   2.296687   1.424394e+00  -0.425848  -3.552308e-01
838                       -0.061368   6.337994e-01   2.339942   1.112986e+00  -0.623141  -2.776557e-01
839                       -0.115564   3.566465e-01   1.778338   8.516269e-01  -0.755297  -2.012408e-01
840                        0.494142   1.155520e+00   2.541562  -1.264389e-15   0.220181   1.333786e+00
841                        0.345102  -5.004564e-18   2.317600   9.330411e-01   0.105603   7.374712e-01
842                        0.166256  -5.004564e-18   2.295078   1.252355e+00  -0.234833   4.486724e-01